Abstract

Background: Recent chromatin immunoprecipitation (ChIP) experiments in fly, mouse, and human have revealed 
the existence of high-occupancy target (HOT) regions or "hotspots" that show enrichment across many assayed 
DNA-binding proteins. Similar co-enrichment observed in yeast so far has been treated as artifactual, and has not 
been fully characterized. 

Results: Here we reanalyze ChIP data from both array-based and sequencing-based experiments to show that in 
the yeast S. cerevisiae, the collective enrichment phenomenon is strongly associated with proximity to noncoding 
RNA genes and with nucleosome depletion. DNA sequence motifs that confer binding affinity for the proteins are 
largely absent from these hotspots, suggesting that protein-protein interactions play a prominent role. The hotspots 
are condition-specific, suggesting that they reflect a chromatin state or protein state, and are not a static feature of 
underlying sequence. Additionally, only a subset of all assayed factors is associated with these loci, suggesting that the 
co-enrichment cannot be simply explained by a chromatin state that is universally more prone to immunoprecipitation. 

Conclusions: Together our results suggest that the co-enrichment patterns observed in yeast represent transcription 
factor co-occupancy. More generally, they make clear that great caution must be used when interpreting ChIP 
enrichment profiles for individual factors in isolation, as they will include factor-specific as well as collective contributions. 
Background

In addition to mapping canonical transcription factor 
(TF) binding sites, chromatin immunoprecipitation (ChIP) 
experiments have revealed genomic loci at which many 
DNA-binding proteins display a signal of enrichment 
despite the absence of an in vitro binding site in the
underlying DNA sequence. These regions have been al- 
ternatively called "TF colocalization hotspots" [1] and 
"high-occupancy target (HOT) regions" [2]. Their ex- 
istence was first demonstrated in a study profiling 
seven Drosophila melanogaster TFs with diverse func- 
tions using the DamID method in cultured embryonic 
cells [1]. In that study, DNA at the hotspots was 
predicted to have affinity for three of the seven proteins
(Gaf, Jra, and Max), but was bound by all seven.
The hotspots were associated with increased expres- 
sion at neighboring genes, suggesting that they are 
functionally relevant. Subsequent ChIP studies in whole 
embryos have confirmed that such hotspots are a general 
feature of the Drosophila [3-5] and the C. elegans [6] ge- 
nomes. The TF colocalization phenomenon has also been 
observed in mammalian cells. An analysis of ChIP profiles 
for 13 TFs collected in mouse embryonic stem cells re- 
vealed extensive colocalization of these proteins along the 
genome [7]. Similarly, analysis of 89 sequence-specific TFs 
in a variety of human cell types [8] identified many HOT 
regions [2]. 

A number of mechanisms have been proposed to explain 
the observed co-enrichment across ChIP experiments. 
Chromatin loops could cross-link to multifunctional "tran- 
scription factories" or enhanceosomes [9]. Non-sequence- 
specific binding can also be driven by a locally permissive 
chromatin structure [3,10]. The authors of the fly DamID
study [1] argue against non-specific binding, because two 
non-endogenous proteins (mutant fly Bcd consisting of
only a DNA-binding domain, and yeast Gal4p consisting of 
only a DNA-binding domain) do not localize to the hot- 
spots, but rather to their predicted in vitro binding sites. 
Direct protein-protein interactions between the involved 
fly TFs have also not been observed, complicating any 
model involving a transcription factory. The authors of the 
mouse study [7], by contrast, suggest that the mouse hot- 
spots represent enhanceosomes, due to their ability to 
drive transcription in a luciferase assay and their recruit- 
ment of the p300 coactivator. A feature shared by both or- 
ganisms is that hotspots are associated with increased 
expression at neighboring genes, but are often located far 
from traditionally-defined proximal promoters. 

The present study was motivated by the fact that, al- 
though extensive genome-wide in vivo protein binding data 
has been collected for the yeast Saccharomyces cerevisiae 
[11-13], no analogous colocalization of sequence-specific 
regulators has been reported for this organism. Signifi- 
cantly, however, in the large-scale compendia by Lee et al. 
[12] and Harbison et al. [11], the authors subtracted, for 
each probe separately, the mean across all arrays in order 
to account for biases in the immunoprecipitation reaction. 
This normalization procedure was certainly appropriate 
given the goal of these studies, namely, to determine the 
specific transcriptional target genes of each individual tran- 
scription factor. However, it would also have largely re- 
moved any true collective genomic enrichment pattern 
shared by many TFs. This insight motivated us to perform 
a detailed re-analysis of the original microarray data in a 
manner that omitted the probe-specific normalization step. 
This revealed that a collective pattern of ChIP enrichment 
also exists in yeast. 
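The effect of the probe-specific normalization can be illustrated with a small numerical sketch. The probe names and log2 fold-enrichment values below are invented for illustration and are not taken from the compendia; the function only mimics the per-probe mean subtraction described above, not the full published pipeline.

```python
from statistics import mean, median

# Toy log2 fold-enrichment values (hypothetical): one list per probe,
# one entry per ChIP array (i.e., per TF).
lfe = {
    "hot":      [1.0, 1.2, 0.9, 1.1],   # enriched in every array
    "specific": [2.0, 0.0, 0.1, -0.1],  # enriched in array 0 only
    "quiet":    [0.0, 0.1, -0.1, 0.0],  # not enriched anywhere
}

def subtract_probe_mean(profiles):
    """Subtract, for each probe separately, its mean across all arrays."""
    return {probe: [v - mean(vals) for v in vals]
            for probe, vals in profiles.items()}

normalized = subtract_probe_mean(lfe)

for probe in lfe:
    # Median across arrays before and after the normalization.
    print(probe, round(median(lfe[probe]), 2), round(median(normalized[probe]), 2))
```

In this toy case the shared enrichment at the "hot" probe vanishes after normalization, whereas the factor-specific signal at the "specific" probe remains the largest value in its row.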

Unlike in higher eukaryotes, the collective enrichment 
patterns in yeast are not associated with sequence- 
predicted protein-DNA binding affinity for any of the TFs 
involved. Rather, sequence and functional analysis reveals 
that the most significant features of co-enriched probed 
regions are: (i) the extent of nucleosome depletion, (ii) ex- 
pression of proximal genes, and (iii) the proximity to non- 
coding RNA genes, the majority of which encode tRNAs 
and snoRNAs. Additionally, the co-enrichment hotspots 
are occupied chiefly in rich-media (YPD) conditions, 
while, strikingly, the phenomenon is abrogated in the ma- 
jority of environmental perturbation and stress conditions. 

Results 

Quantifying collective ChIP enrichment in rich media 
conditions 

First, we performed a detailed re-analysis of the raw ChIP-chip
data from Lee et al. [12] and Harbison et al. [11], but
without performing their normalization procedure across
experiments (see Methods). To characterize the shared
component of the ChIP profiles collected in rich media 
(YPD), we computed the median log2 fold-enrichment
(MLFE) across 195 TFs as a measure of co-enrichment for 
each probe. The distribution of MLFE across probes was 
skewed heavily to the right (Figure 1), a shared enrich- 
ment profile that was evident in the authors' original ana- 
lysis but not fully characterized. The re-analyzed ChIP 
landscapes were also more correlated with each other than 
the normalized profiles from the original paper (Figure 2). 
We proceeded to investigate the location of the co- 
enrichment phenomenon relative to genomic features. 
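As a minimal sketch of the MLFE statistic, the following computes a log2 fold enrichment per probe and TF from IP and whole-cell extract (WCE) intensities, then takes the median across TFs. The TF names, probe names, and intensity values are hypothetical, and the sketch assumes already-normalized intensities; the actual preprocessing is described in Methods.

```python
import math
from statistics import median

# Hypothetical (IP, WCE) intensity pairs per TF and probe; illustrative only.
intensities = {
    "TF_A": {"probe1": (800.0, 400.0), "probe2": (300.0, 300.0)},
    "TF_B": {"probe1": (600.0, 300.0), "probe2": (150.0, 300.0)},
    "TF_C": {"probe1": (500.0, 500.0), "probe2": (300.0, 300.0)},
}

def log2_fold_enrichment(ip, wce):
    """log2 ratio of IP to whole-cell extract intensity."""
    return math.log2(ip / wce)

def mlfe(data):
    """Median log2 fold enrichment across TFs, computed per probe."""
    probes = sorted({p for per_tf in data.values() for p in per_tf})
    result = {}
    for probe in probes:
        values = [log2_fold_enrichment(*per_tf[probe])
                  for per_tf in data.values() if probe in per_tf]
        result[probe] = median(values)
    return result

scores = mlfe(intensities)
print(scores)
```

The median is used rather than the mean so that a single strongly bound factor does not by itself make a probe appear co-enriched.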

Collective enrichment is strongly associated with 
noncoding-RNA genes 

A first glance at the most highly co-enriched probed re- 
gions revealed a preponderance of telomeres and noncod- 
ing RNAs (ncRNAs) (Table 1). To systematically determine 
whether specific genomic features were associated with 

Figure 1 ChIP co-enrichment. Distribution of TF ChIP co-enrichment
across probes. Co-enrichment is quantified as median log2 fold
enrichment (MLFE) across all analyzed rich media experiments
from Harbison et al. [11]. The distribution of the original normalized
published data is in gray, and the distribution of the reanalyzed data is
in red.
Figure 2 ChIP enrichment profiles from published and reanalyzed data. ChIP-chip enrichment profiles across all analyzed rich media experiments
and correlations among them. An enrichment profile heatmap and correlatogram is shown for both the original normalized published data of
Harbison et al. [11] and our reanalysis. TFs in all four matrices were sorted by their enrichment at ncRNA genes in the reanalyzed data;
probes in the heatmaps were sorted by their median log2 fold enrichment (MLFE) in the reanalyzed data.



co-enrichment, we tested whether the distribution of 
MLFE for probes corresponding to each annotated 
genomic feature was different from that corresponding 
to the rest of the genome (Figure 3). The most signifi- 
cantly co-enriched were the 514 probes corresponding 
to ncRNA genes (difference of median fold enrichment 
ΔMLFE = 0.27; p = 6.9 x 10^-161, Student's t-test; p <
2.2 x 10^-16, Wilcoxon-Mann-Whitney test). The more
specific ncRNA categories of tRNAs, snoRNAs, and 
snRNAs were all significantly co-enriched as well. 
There were not enough probes corresponding to rRNA 
genes to establish statistical significance. 
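The comparison of one probe family against all other probes can be sketched as follows. This is a minimal illustration that assumes a pooled-variance (Student) two-sample t statistic and uses invented MLFE values; it returns only the difference in means and the t statistic, since converting t to a p-value requires the t distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance

def delta_and_t(family, others):
    """Return (difference in means, pooled-variance Student t statistic)."""
    n1, n2 = len(family), len(others)
    delta = mean(family) - mean(others)
    # Pooled sample variance across the two groups.
    pooled = ((n1 - 1) * variance(family) + (n2 - 1) * variance(others)) / (n1 + n2 - 2)
    t = delta / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return delta, t

# Hypothetical MLFE values for probes in a family and for all other probes.
family_mlfe = [0.4, 0.5, 0.6]
other_mlfe = [0.0, 0.1, 0.2, 0.1]

delta, t = delta_and_t(family_mlfe, other_mlfe)
print(round(delta, 3), round(t, 2))
```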

Probes were mapped to a feature if there was any over- 
lap between the probe and feature. For each probe, the 
co-occupancy was defined as the median log2 ChIP-chip
fold enrichment (MFE) across all rich media experiments. 



For each feature, the probe family co-occupancy AA was 
defined as the difference in mean co-occupancy within 
each probe family and mean co-occupancy at all other 
probes. The p-value was determined using a t-test. Significant
p-values are highlighted.

A subset of yeast tRNA genes have been demonstrated 
to colocalize to the nucleolus. We therefore asked whether 
TF co-enrichment is associated with nucleolar localization. 
We used the classification of yeast tRNA genes as nucleolar
or non-nucleolar based on a three-dimensional model
of the yeast genome derived from chromatin conformation 
capture data by Duan et al. [14]. However, we found no sig- 
nificant difference in rich media MLFE between the two 
sets of genes (t = 0.67, p = 0.51). Therefore, nucleolar and 
centromeric tRNA genes seem to participate in the collect- 
ive enrichment phenomenon to an equal degree. 
Table 1 Probes with highest median ChIP-chip
fold-enrichment (FE) across rich media experiments

Probe Median FE Notable feature nearby? Distance (bp)

*Probe sequence does not map uniquely to the genome.

List of probes with highest median ChIP-chip fold-enrichment (FE) across rich
media experiments from Harbison et al. [11].



Evidence that collective enrichment is not due to 
technical artifact 

Because telomeres and tRNA genes are associated with 
repetitive elements [15,16] in addition to having a high 
genomic copy number, we suspected that their consist- 
ently high enrichment across experiments could be an 
artifact of cross-hybridization [17,18]. To test for this, 
we inspected spot intensities and performed a more 
finely-grained classification of probes (Figure 4; see 
Methods). We decided to exclude probes corresponding 
to telomeres or overlapping ncRNA genes by more than 
25 bp from the remainder of our analysis (see Methods). 

TF co-enrichment M was defined for each probe as
the median log2 fold enrichment across all rich media
ChIP-chip experiments, and the family ΔM as the difference
in mean M among probes in a family and all other
probes. The p-value was calculated using a t-test. Similarly,
the absolute intensity A for each probe in each experiment
was defined as the mean (Lowess-normalized)
intensity between the red and green channels; the median
A was calculated across all experiments for each
probe; and the family ΔA was reported as the difference
in mean A among probes within a family and all other
probes. Probe mapping and categories for comparison
are as follows (see Methods for details of probe
categorization): (A) Probes with high overlap vs. all other
probes. (B) Probes with any overlap vs. all other probes.
(C) Probes with low overlap or neighboring vs. non-neighboring
probes (high overlap probes excluded from
the analysis). (D) Neighboring probes vs. non-neighboring
probes (probes with any overlap excluded from the
analysis). Significant co-enrichment (ΔM) p-values are
highlighted yellow; significant intensity (ΔA) p-values,
which may signify cross-hybridization, are highlighted
red.

A plot of MLFE versus distance between the center of 
each probe and the center of the nearest ncRNA gene 
(Figure 5) shows a gradual and approximately exponen- 
tial decay with increasing distance. The decay length is 
similar to a typical IP fragment length [19]. By contrast, 
cross-hybridization would appear as spikes as a function
of genomic position with no such decay around peaks, 
as was discussed by Orian and colleagues [20]. We con- 
clude that cross-hybridization is not responsible for the 
observed signal. 
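The decay of MLFE with distance can be summarized by fitting an exponential model. The sketch below assumes the functional form y = b0 + b1 exp(-x/d), with x the distance to the nearest ncRNA gene and d the decay length; this form is our reading of the Figure 5 legend, and the fitting routine (a grid search over d combined with ordinary least squares for b0 and b1) is an illustrative stand-in rather than the procedure actually used. The data are synthetic and noise-free.

```python
import math

def fit_decay(xs, ys, d_grid):
    """Fit y = b0 + b1 * exp(-x / d): grid search over d, least squares for b0, b1."""
    n = len(xs)
    ybar = sum(ys) / n
    best = None
    for d in d_grid:
        zs = [math.exp(-x / d) for x in xs]      # regressor for this candidate d
        zbar = sum(zs) / n
        szz = sum((z - zbar) ** 2 for z in zs)
        szy = sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
        b1 = szy / szz
        b0 = ybar - b1 * zbar
        sse = sum((y - (b0 + b1 * z)) ** 2 for z, y in zip(zs, ys))
        if best is None or sse < best[0]:
            best = (sse, b0, b1, d)
    return best[1], best[2], best[3]

# Synthetic data generated from known (illustrative) parameters.
true_b0, true_b1, true_d = -0.03, 0.44, 300.0
xs = list(range(0, 3000, 100))                   # distances in bp
ys = [true_b0 + true_b1 * math.exp(-x / true_d) for x in xs]

b0, b1, d = fit_decay(xs, ys, d_grid=range(50, 1001, 50))
print(round(b0, 3), round(b1, 3), d)
```

With real, noisy data a finer grid or a nonlinear least-squares routine would be preferable; the grid search is used here only to keep the example dependency-free.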

Biases in IP efficiency and shearing based on chro- 
matin state have been shown to be important in the in- 
terpretation of ChIP experiments [23,24]. To check 
whether such biases affected immunoprecipitation or 
hybridization efficiency of ncRNA genes, we inspected 
control experiments that used no antibody or a nonspecific 
antibody [22]. We observed a weak depletion of ncRNA 
genes in the mock IP samples relative to the whole-cell extract
(no-antibody: ΔMLFE = -0.12; p = 2.9 x 10^-24, t-test;
rabbit IgG: ΔMLFE = -0.13; p = 1.9 x 10 , t-test; Figure 5).
These controls suggest that any immunoprecipitation bias
at ncRNA genes would cause us to underestimate rather
than overestimate the magnitude of the hotspot effect.

Figure 3 Comparison of TF co-occupancy in each probe family vs. all other probes.

The ChIP-chip experiments that we re-analyzed for
this study all relied on myc-tagged proteins. In humans,
the c-Myc protein is localized to the nucleolus, raising
the possibility that myc-tagged proteins in the ChIP
experiment would be artificially biased towards tRNA
genes, some of which cluster in the nucleolus [25-27].
To rule out this possibility, we performed the same
analysis on a set of ChIP-chip data that employed FLAG
tagging rather than myc tagging, and high-density tiling
probes [21]. The kinases assayed in this experiment
again showed shared IP at ncRNA genes, an exponential
decay with increasing distance between the probed
region and the ncRNA gene, and a comparable quantitative
enrichment near ncRNA genes (ΔMLFE = 0.36; p =
2.1 x 10^-116, t-test; Figure 5). Taken together, the above
results make it unlikely that shared IP is due to a
tag-specific artifact.



For most TFs, in vitro DNA binding specificity is a poor 
predictor of in vivo occupancy 

The canonical view holds that the DNA-binding domain 
(DBD) of a TF is responsible for its recruitment to specific 
sequences in the genome. However, highly specific yet 
DBD-independent recruitment to sites of co-occupancy 
has been demonstrated using recombinant Bicoid protein 
in Drosophila [1]. The landscape of co-enrichment that 
we have characterized represents an independent contri- 
bution to the ChIP enrichment landscape of any given TF, 
which complements the sequence-specific targeting via its 
DBD. We were interested in contrasting these two predictors
and quantifying the extent to which each of them
contributes to the overall genomic enrichment profile for
a TF. To this end, we calculated the Pearson correlation,
across all probes, between the log2 fold enrichment (LFE)
for each TF and (i) the median log2 fold-enrichment
(MLFE) over all other TFs profiled in rich media, and (ii)
the regional in vitro binding affinity predicted from DNA
sequence using a position-specific affinity matrix for the
TF from protein-binding microarray (PBM) data from
Badis et al. [28] and Zhu et al. [29] (Figure 6). For almost
all TFs, the correlation with MLFE is significant (mean
value of r = 0.31), indicating that the co-enrichment signal
contributes to their IP profile to a significant extent. A
notable exception is Yap1p, whose LFE is significantly
anticorrelated with the MLFE of all other factors. For a
smaller number of TFs, LFE correlates with predicted
affinity, but always to a lesser extent than with MLFE (mean
r = 0.04), with the exception of Abf1p.
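The leave-one-out comparison in (i) can be sketched as follows: for a given TF, the MLFE is recomputed from all other TFs and then correlated, across probes, with that TF's own LFE profile. The TF names and values are hypothetical, and the Pearson correlation is written out by hand so that the example depends only on the standard library.

```python
import math
from statistics import mean, median

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_with_others(lfe, tf):
    """Correlate one TF's LFE profile with the MLFE over all other TFs."""
    others = [name for name in lfe if name != tf]
    n_probes = len(lfe[tf])
    mlfe_others = [median(lfe[o][i] for o in others) for i in range(n_probes)]
    return pearson(lfe[tf], mlfe_others)

# Hypothetical LFE profiles over four probes (illustrative values only).
lfe = {
    "TF_A": [1.0, 0.2, 0.0, 0.9],
    "TF_B": [0.8, 0.1, 0.1, 1.1],
    "TF_C": [1.2, 0.0, -0.1, 1.0],
    "TF_D": [-0.5, 0.6, 0.4, -0.7],  # runs opposite to the shared pattern
}

r_a = correlation_with_others(lfe, "TF_A")
r_d = correlation_with_others(lfe, "TF_D")
print(round(r_a, 2), round(r_d, 2))
```

Excluding the TF under study from the median avoids a trivial positive correlation of each profile with itself.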



Co-enriched loci are associated with nucleosome 
depletion and high expression 

To explore other relationships between genome function 
and TF co-enrichment, we looked for Gene Ontology 
(GO) categories of proximal genes (Table 2). For every 
GO category, we compared the distribution of MLFE 
within probes corresponding to promoters of genes in 
that category with the rest of the probes. The most 
enriched protein functions are for translation (translational
elongation, t = 13.8, p = 7.7 x 10~ ; cytoplasmic
translation, t = 13.8, p = 1.1 x 10^-42) and accordingly,
ribosomal proteins as a whole are strongly enriched (t = 10.4,
p = 3.1 x 10~ ). Because ribosomal protein (RP)
promoters are known to be particularly active [30], we
were interested in whether expression globally correlates
with co-enrichment, and found that it does (Pearson
r = 0.17, p = 1.1 x 10^-40; Figure 7A). We also found that
co-enrichment is even more strongly anticorrelated with
nucleosome occupancy (Pearson r = -0.31, p = 1.3 x 1CT ;
Figure 7B).

Figure 4 Comparison of TF co-occupancy and absolute intensity among selected probe families and sub-families and different
mapping criteria. TF co-enrichment M was defined for each probe as the median log2 fold enrichment across all rich media ChIP-chip
experiments, and the family ΔM as the difference in mean M among probes in a family and all other probes. The p-value was calculated using a
t-test. Similarly, the absolute intensity A for each probe in each experiment was defined as the mean (Lowess-normalized) intensity between the
red and green channels; the median A was calculated across all experiments for each probe; and the family ΔA was reported as the difference in
mean A among probes within a family and all other probes. Probe mapping and categories for comparison are as follows (see Methods for
details of probe categorization): (A) Probes with high overlap vs. all other probes. (B) Probes with any overlap vs. all other probes. (C) Probes with
low overlap or neighboring vs. non-neighboring probes (high overlap probes excluded from the analysis). (D) Neighboring probes vs.
non-neighboring probes (probes with any overlap excluded from the analysis). Significant co-enrichment p-values are highlighted yellow; significant
intensity p-values, which may signify cross-hybridization, are highlighted red.

We were also interested in whether the TF co- 
enrichment profile was correlated with affinity for TFs. 
We calculated the predicted affinity of each probe for a 
compendium of TFs. Among TF affinities predicted from 
protein binding microarray (PBM) data, only affinity for 
Rsc30p, Rsc3p, and Rap1p correlated with MLFE (Pearson
r = 0.07, r = 0.06, and r = 0.06, respectively). Binding
by these factors has previously been shown to drive nu- 
cleosome depletion at RP promoters [28,32], consistent 
with the correlation with nucleosome depletion de- 
scribed above. 

Collective enrichment at ncRNA genes is largely 
eliminated in perturbed conditions 

So far, our analysis has been restricted to rich media
(YPD) conditions, providing a uniform chromatin context
for comparison across factors. Examining ncRNA
loci in experimental perturbation ("stress") conditions
reveals dramatically reduced co-enrichment (Figures 8
and 5). Using the median TF enrichment across all non-YPD
conditions, the elevation in co-enrichment at
ncRNA genes drops from 0.25 to 0.03. To investigate
this general observation further, we focused on the ChIP
enrichment of individual TFs in their rich media and
stress conditions. For each particular stress-TF combination
(i.e., each experiment), we calculated the enrichment
at ncRNA genes relative to all other probes
(Figure 9). As expected from our pooled analysis, in
the majority of stress conditions the enrichment at
ncRNA genes is greatly reduced. For two TFs, viz.
Kss1p and Gal4p, ncRNA genes are preferentially ChIP
enriched in YPD, while in stress the enrichment at
ncRNA genes is lower than elsewhere in the genome.
Kss1p shows a negative relative occupancy of ncRNA
genes in alpha mating factor and 1-butanol conditions.
Gal4p shows decreased preferential enrichment at
ncRNA genes in galactose and avoidance of these loci
in raffinose.

Criterion for probe mapping is the same as in Figure 4C:
Probes with low overlap or neighboring vs. non-neighboring
probes (high overlap probes excluded
from the analysis). Significant p-values are highlighted
in yellow.

Interestingly, those TFs that do not participate in ChIP
co-enrichment at ncRNA genes as strongly in rich media
conditions are more likely to be ChIP-enriched at
ncRNA genes in other conditions. The most notable
example of this is Ste12p, which is enriched at ncRNA
genes upon exposure to alpha mating factor, but not in
the absence of alpha factor or in the presence of
1-butanol. Dig1p, which is also associated with the mating
response, behaves differently: it is not enriched at ncRNA
genes in rich media, and is also not enriched at them in
the presence of alpha mating factor and 1-butanol. Finally,
among the other TFs that exhibit ncRNA depletion in rich
media, Mot3p shows a loss of this depletion in the
presence of hydrogen peroxide or sulfometuron methyl. The
fact that enrichment at ncRNA genes is both factor and
condition specific supports the view that the ChIP co-enrichment
is not solely determined by the chromatin state at the
co-enriched loci, and is dependent on the identity and activity
of the binding proteins.

Co-enrichment during oxidative stress is reduced, not 
moved to other loci 

To directly compare co-enrichment between YPD and 
perturbed conditions, we looked at the hydrogen perox- 
ide condition, which has the highest number of factors 
assayed in common with YPD. We then calculated 
MLFE in each condition using only the subset of factors 
that was assayed in both, and performed GO analysis 
(Figure 10) and expression correlation analysis (Figure 11). 
Analyzing this subset, we again found the strongest co- 
enrichment at promoters of ribosome-associated genes, in 
both YPD and hydrogen peroxide conditions (Figure 10). 
However, the enrichment was greatly reduced during 
oxidative stress, to the extent that only one GO cat- 
egory (small ribosomal subunit; see highlighted row) in 
the H2O2 condition showed an enrichment surpassing
a threshold of p < (0.05/748 categories). In addition, 
the correlation between co-enrichment and expres- 
sion is much weaker during oxidative stress (YPD 
Figure 5 ChIP co-enrichment at ncRNA genes. TF co-enrichment,
defined as the median log2 ChIP-chip fold enrichment (MLFE), as a
function of distance to the nearest ncRNA gene. Plotted in black is a
fit to y = b0 + b1 e^(-x/d), where x is the distance. Top to bottom: (A) Co-enrichment across YPD
experiments from Harbison et al. [11]: b0 = -0.03, b1 = 0.44, d = 316.8;
p for each parameter < 2 x 10^-16; r = 0.27. (B) Co-enrichment across
non-YPD experiments from Harbison et al. [11]: b0 = -0.004, b1 = 0.09,
d = 1543; p for each parameter > 0.04; r^2 = 0.004. (C) Co-enrichment
across YPD experiments from Pokholok et al. [21]: b0 = -0.01, b1 = 0.67,
d = 148.4; p for each parameter < 2 x 10^-16; r^2 = 0.04. (D) ChIP-chip log2
fold enrichment for no-antibody control from Pokholok et al. [22]:
b0 = -0.008, b1 = -0.13, d = 504.2; p for each parameter < 4.4 x 10^-10;
r^2 = 0.004. (E) ChIP-chip log2 fold enrichment for anti-rabbit IgG control
from Pokholok et al. [23]: b0 = -0.02, b1 = -0.15, d = 417.1; p for each
parameter < 4.9 x 10^-8; r^2 = 0.004.



r = 0.17, slope = 0.02 ± 0.002, p = 1.37 x 10" db ; H2O2
r = 0.05, slope = 0.005 ± 0.001, p = 3.1 x 10^-4).
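The significance threshold used for the GO analysis, p < (0.05/748 categories), is a Bonferroni correction. A minimal sketch of that cutoff is given below; the category names and p-values are hypothetical and serve only to show how the threshold is applied.

```python
ALPHA = 0.05
N_CATEGORIES = 748  # number of GO categories tested, as stated in the text

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff under a Bonferroni correction."""
    return alpha / n_tests

def passes(p_value, alpha=ALPHA, n_tests=N_CATEGORIES):
    """True if a p-value survives the Bonferroni-corrected cutoff."""
    return p_value < bonferroni_threshold(alpha, n_tests)

threshold = bonferroni_threshold(ALPHA, N_CATEGORIES)

# Hypothetical p-values for two categories (illustrative only).
example = {"category_1": 1e-6, "category_2": 1e-3}
surviving = [name for name, p in example.items() if passes(p)]
print(threshold, surviving)
```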

TFs in this analysis were restricted to the subset shared 
between YPD and H2O2 conditions, and the top GO en- 
richments are shown for YPD. The highlighted row is the 
only category that is significant in H2O2 after Bonferroni
correction. Expression values were obtained from Huebert 
and Gasch (2012) as described in Methods. 

Validation by ChIP-Seq

For validation purposes, we compared three Ste12p
ChIP-Seq datasets, one of which was performed in
pseudohyphal conditions and two in exposure to alpha
mating factor [33,34]. Both conditions showed enrichment near ncRNA
genes, although the magnitude was greater during
exposure to alpha mating factor, consistent with the
experiments of Harbison et al. [11] (Figure 12). These data
further support the conclusion that the hotspot effect is not an artifact
of microarray technology.

Discussion 

Other evidence for TF colocalization in the yeast 
literature 

Our reanalysis of the ChIP-chip compendia of Lee et al.
[12] and Harbison et al. [11] has revealed co-enrichment 
of yeast TFs at ncRNA genes. In a more recent study, 
Venters and colleagues used low-density tiling microar- 
rays to assay the occupancy of a broader range of factors 
[13]. Because of differences in probe design, their occu- 
pancy data are not directly comparable to those of Lee 
et al. [12] and Harbison et al. [11], and are not suited to 
the interrogation of transcribed regions; however, the authors
noted a surprising association of Pol II-associated
factors with tRNA promoters. Two recent studies in 
yeast have recognized non-canonical binding in light of 
the known biological roles of TFs. Fan and Struhl [35] 
found condition-specific Mediator binding over many 
gene bodies, rather than upstream promoter regions 
where it is known to act; they argued based on the low 

Figure 6 Correlation of ChIP enrichment for individual factors with co-enrichment and predicted affinity. Left to right: (A) Shared
enrichment for each factor measured as the Pearson correlation between the TF's genomewide enrichment landscape (in terms of log2 fold
enrichment) and the median log2 fold enrichment (MLFE) across all other rich media ChIP-chip experiments. (B) Sequence-specific enrichment for
each factor measured as the Pearson correlation between the TF's genomewide enrichment and the predicted genomewide affinity for that TF
from the PBM data of either Badis et al. [28] or Zhu et al. [29] (stronger correlation shown when TF is in both datasets). (C) Scatter plots showing
the correlations described above (ChIP enrichment vs. co-enrichment and ChIP enrichment vs. affinity) for each of four factors: Met31p, Reb1p,
Abf1p, and Yap1p.



enrichment and reproducibility that these targets repre- 
sent indirect binding due to chromatin state. Teytelman 
and colleagues [36], motivated by finding components of 
the Sir silencing complex at actively-transcribed regions, 
found that exogenously expressed GFP also immunopreci- 
pitated with these regions in a condition-specific manner. 

Possible mechanisms underlying dynamic co-enrichment 
at ncRNA genes 

Genomic recruitment of transcription factors is usually con- 
ceptualized as binding of the DNA-binding domain of the 
protein to high-affinity consensus sequences in the DNA, 
contingent on the local accessibility of the DNA. Our find- 
ing that many studied yeast transcription factors preferen- 
tially immunoprecipitate with nucleosome-depleted DNA is 
consistent with previous observations that TFs will nonspe- 
cifically bind to naked DNA at a low level [37]. Within the 
nucleus, nucleosome-depleted regions may most closely 

Table 2 Gene Ontology (GO) enrichment analysis of 
genes by level of TF co-enrichment at neighboring 
probes 

GO Category p-value t 

13.84 
13.81 
13.04 
11.80 
11.65 
10.43 
10.08 
9.52 
9.21 
9.18 
9.08 
8.34 
7.15 
6.04 
6.04 
5.78 
5.66 
5.54 
5.44 



resemble naked DNA in vitro, in which case they ought 
to display a higher level of nonspecific binding relative 
to nucleosome-occupied and heterochromatic regions. 
However, we have shown here that the hotspot phe- 
nomenon can only be partly explained in terms of 
chromatin accessibility, because even when using the 
same antibody, the ChIP enrichment at hotspots de- 
pends on which TF carries the affinity tag. This is con- 
sistent with the recent observation in fly Kc cells [38] 
and in cultured human cells [39] that the optimal 
chromatin context - i.e., the chromatin type for which 
the highest degree of occupancy is observed at a given 
level of sequence-predicted DNA binding affinity - is 
different for each TF, and that none of the chromatin 
states is globally permissive. 

Both the ChIP and DamID method can detect TFs that 
are near DNA but not necessarily contacting it. Conse- 
quently, the observed co-enrichment signal could be due 
to the proximity of probed regions to the TFs rather 
than due to direct interactions with them. Indeed, for 
individual yeast TFs, indirect interactions have been 
proposed in order to account for the poor correlation 
between in vitro sequence specificity as measured 
by protein binding microarrays (PBMs) and in vivo 
Figure 7 Correlation of TF co-enrichment with gene expression
and nucleosome occupancy. (L) Scatter plot of TF co-enrichment
vs. gene expression in YPD from Huebert and Gasch [31]. Each point
is the expression level for a gene and the co-enrichment (MLFE) of
neighboring regions; expression values are log2 of quantile normalized
intensity values. Plotted as a black line is a fit of all the data to a linear
model (r = 0.17). (R) Scatter plot of TF co-enrichment vs. nucleosome
occupancy by nucleosome ChIP from Bernstein et al. [32]. Each point is
a probed region assayed both by Harbison et al. and Bernstein et al.
Plotted as a line is a fit of all the data to a linear model (r = -0.31).
Figure 8 Comparison of TF co-enrichment for ncRNA families in rich media and stress conditions.



occupancy as measured by ChIP-chip [40]. Fly and
mouse hotspots have been hypothesized to reflect both 
direct interactions mediated by the DNA-binding do- 
main of certain TFs and indirect, protein-protein inter- 
actions involving the other co-enriched TFs [1,7]. Our 
sequence analysis does not provide any evidence of direct 
sequence-specific interactions with TFs. Nucleosome de- 
pletion and proximity to ncRNA genes both predict co- 
enrichment significantly better than local regional bind- 
ing affinity predicted from DNA sequences using ei- 
ther known binding specificities or de novo motif 
discovery. The co-enrichment could also be the result 
of competitive binding by different TFs in different 
subsets of cells and at different times, as has been sug- 
gested by recent work [41,42]. 

Several lines of cytological evidence from mammalian 
cells suggest that transcription by polymerase II occurs 
at nuclear foci comprising many polymerase molecules 
and transcription factors, termed "transcription factor- 
ies" [9]. If such factories exist in yeast, it is conceivable 
that nucleosome-free regions and ncRNA genes - which 
are associated with high levels of transcription (by poly- 
merase II and I/III, respectively) - are in close proximity 
to multiple TFs as a result of transcription factories. Indeed,
it was recently discovered that Pol II-associated transcription
factors tightly associate with Pol III-transcribed
genes in human cells [43].

Conclusions 

Our results show that the median enrichment across all 
TFs is far more predictive of the ChIP landscape of a 
typical individual yeast TF than DNA sequence is. This 
agrees with a recent study of the interaction between 
chromatin accessibility and sequence specificity [10]. 
While the normalized enrichment data of the original
yeast ChIP-chip compendia [11,12] have proven immensely
valuable for understanding and modeling regulatory
networks, any other ChIP experiment not subjected
to the same normalization will display both
sequence-specific and hotspot targeting. As genomic protein
occupancy mapping technology increases in resolution
and sensitivity, understanding the structure, origin, and
possible function of co-enrichment hotspots will become
increasingly important to interpreting the resulting
data.

Methods 

Processing of raw ChIP-chip data

The original raw ChIP-chip data [11,12] were obtained
from ArrayExpress (http://www.ebi.ac.uk/arrayexpress/)
using accession numbers E-WMIT-1 and E-WMIT-10,
respectively. Protocol information for each array (which
dye was IP vs. WCE, experimental conditions, etc.) was
extracted from the files E-WMIT-1.sdrf.txt and
E-WMIT-10.sdrf.txt, available in the directory
ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/experiment/WMIT/.
Raw intensity information was downloaded from the
tab-delimited text files in E-WMIT-1.raw.zip and
E-WMIT-10.raw.zip, available in the FTP directory specified
within the aforementioned text files. The column headers in
all of these text files were found to be corrupted, and
the files fell into nine different formats. Each format
was manually curated to locate the correct median
foreground and background red and green intensity
columns, using the presence of a background-subtracted log
ratio column as a validation. Four of the experimental
conditions had array data in the database that also had
corrupted rows, where the number of columns was not
consistent throughout the whole file; data associated with
these conditions (Dal81p sulfometuron methyl, Arg80p
sulfometuron methyl, Mac1p hydrogen peroxide, and
Ime1 hydrogen peroxide) were discarded. Raw intensities
were loaded into R and Loess normalization was
performed on each array (to account for dye-specific
response functions) using the normalizeWithinArrays
function of the limma package [44], resulting in an M
Figure 9 Condition specificity of co-enrichment at ncRNA genes. (A) Each row is a TF, and experimental conditions for that TF are plotted
on the same row with letters indicating the condition. Conditions are: "Y", rich media; "S", sulfometuron methyl; "R", rapamycin; "H", hydrogen
peroxide; "1", 1-butanol; "A", succinic acid; "G", galactose; "V", vitamin deprived medium; "M", alpha mating factor; "F", raffinose; and "P", phosphate
deprived medium. ChIP enrichment at ncRNA genes is expressed as the difference between the mean log2 fold enrichment of ncRNA gene
probes and the mean log2 fold enrichment of all other probes. (B) Leu3p, Ste12p, and Mot3p enrichment at ncRNA genes in rich media vs.
sulfometuron methyl treatment. For each factor and condition, an empirical cumulative distribution function is shown contrasting the distribution
in log2 fold enrichment (FE) for ncRNA gene probes and all other probes.
Figure 10 Gene Ontology (GO) enrichment analysis of genes by level of TF co-enrichment in both YPD and H2O2 conditions at
neighboring probes.



(relative intensity) and A (absolute intensity) value for 
each spot on each array. A number of the arrays were 
found to have very low variance in their log ratios; ar- 
rays with a variance in M after Loess normalization 
less than 0.05 were discarded. Four summary values 
were calculated for each probe: a median log ratio (M) 
and intensity (A) signal across all rich-media (YPD) ar- 
rays, and a median log ratio (M) and intensity (A) 
across all stress arrays. Additionally, for every experi- 
mental condition for which multiple replicates were 
available, a median M and A value across replicates 
was calculated. The same processing was applied to 
ArrayExpress data from assaying rabbit IgG control, 




Figure 11 Correlation of TF co-enrichment with gene expression
in different conditions. Scatter plots of TF co-enrichment vs. gene
expression in YPD from Huebert and Gasch [31]. In each case, the
co-enrichment (MLFE) is defined by using only data from TFs
assayed in both YPD and H2O2. Each point is the expression level
for a gene and the co-enrichment (MLFE) of neighboring regions;
expression values are log2 of quantile normalized intensity values.
(L) YPD, (R) H2O2.
Figure 12 Validation using ChIP-seq data. Density of Ste12p
ChIP-seq reads relative to the genome-wide coverage for the two
parents tested under exposure to alpha mating factor in Zheng
et al. [34] and the strain tested under pseudohyphal growth
conditions in Lefrancois et al. [33].



Ward et al. BMC Genomics 2014, 15:494 
http://www.biomedcentral.com/1471-2164/15/494 






no-antibody control, and kinase occupancy by tiling 
array [21,22], which we used for validation. 
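The per-spot M and A values, the variance filter, and the per-probe medians described in this subsection can be sketched as follows. This is a minimal illustration only: the Loess step performed with limma in R is omitted, the inputs are assumed to be background-subtracted IP and WCE intensities that are already dye-assigned, the function names are hypothetical, and the use of the sample variance is an assumption (the text does not specify which estimator was used).

```python
import math
from statistics import median, variance


def ma_values(ip, wce):
    """Per-spot M (log2 ratio) and A (mean log2 intensity) for one array."""
    m = [math.log2(i / w) for i, w in zip(ip, wce)]
    a = [0.5 * (math.log2(i) + math.log2(w)) for i, w in zip(ip, wce)]
    return m, a


def keep_array(m_values, min_var=0.05):
    """Variance filter: keep an array only if var(M) is at least min_var.

    Sample variance is an assumption made for this sketch.
    """
    return variance(m_values) >= min_var


def probe_medians(arrays):
    """Median per probe across arrays (each array lists values in probe order)."""
    return [median(values) for values in zip(*arrays)]


# Toy data: three arrays, three probes each (not real measurements).
ip = [[400, 100, 1600], [210, 200, 190], [800, 100, 400]]
wce = [[100, 100, 100], [200, 200, 200], [100, 100, 100]]

m_arrays = [ma_values(i, w)[0] for i, w in zip(ip, wce)]
kept = [m for m in m_arrays if keep_array(m)]  # the flat second array is dropped
summary = probe_medians(kept)
```

The same `probe_medians` helper could be applied separately to rich-media arrays, stress arrays, or replicate groups to obtain the summary values listed above.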

Genome annotation 

The genomic coordinates of probes were mapped to the 
chromosome sequences contained in the GFF-formatted 
sequence and annotation available from the Saccharo- 
myces Genome Database (SGD) [45], dated 21 April 
2007, and the distance from each probe to the nearest 
annotated genomic feature of each type was calculated. 
More specifically, both a gap (defined as zero if overlap- 
ping, and otherwise the distance between the edge of a 
probe and the edge of a feature) and an overlap were 
calculated. The GFF file was further parsed to divide 
tRNAs into spliced vs. intronless and Ty-flanked vs. Ty- 
absent tRNAs, and to divide snoRNAs into H/ACA-box 
vs. C/D-box and intron-derived vs. extragenic snoRNAs. 
The array design includes both probes that are centered 
on tRNAs, and probes that only overlap partially with 
tRNAs. For each category of genomic feature, we defined 
the probes that were centered on the feature ("high over- 
lap" > 25 bp), those with a partial overlap ("low over- 
lap" < 25 bp), those that were neighboring ("neighbors," 
no overlap, gap between 1 and 100 bp), and all other 
probes. 
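The gap, overlap, and four-way probe classification described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: coordinates are taken to be 1-based with inclusive ends (as in GFF), the function names are hypothetical, and an overlap of exactly 25 bp is assigned to "low overlap" because the text leaves that boundary case unspecified.

```python
def overlap_and_gap(probe, feature):
    """Return (overlap, gap) in bp for two (start, end) intervals, ends inclusive.

    The gap is zero when the intervals overlap; otherwise it is the
    distance between the nearest edges of the probe and the feature.
    """
    p_start, p_end = probe
    f_start, f_end = feature
    overlap = max(0, min(p_end, f_end) - max(p_start, f_start) + 1)
    gap = 0 if overlap > 0 else max(p_start, f_start) - min(p_end, f_end)
    return overlap, gap


def classify_probe(probe, feature, high_cutoff=25, neighbor_max=100):
    """Assign a probe to one of the four categories relative to one feature."""
    overlap, gap = overlap_and_gap(probe, feature)
    if overlap > high_cutoff:
        return "high overlap"
    if overlap > 0:  # assumption: exactly 25 bp counts as low overlap
        return "low overlap"
    if 1 <= gap <= neighbor_max:
        return "neighbor"
    return "other"


probe = (100, 199)
labels = [
    classify_probe(probe, (150, 250)),  # 50 bp shared
    classify_probe(probe, (190, 300)),  # 10 bp shared
    classify_probe(probe, (250, 300)),  # 51 bp away
    classify_probe(probe, (400, 500)),  # 201 bp away
]
```

In practice each probe would be compared against every feature of a given type, keeping the closest one; the sketch shows only the pairwise rule.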

Annotation-specific inspection of intensities to test for 
cross-hybridization 

In order to test for cross-hybridization, we inspected
median intensities and performed t-tests for probes
corresponding to each class and sub-class of features
defined above (Figure 4). While probes corresponding to
telomeres had higher median log2 fold enrichment (MLFE;
ΔMLFE = 0.41, p = 8.1 × 10^-5), they also had higher median
intensities (ΔA = 2.23, p = 1.8 × 10^-4). Therefore, we
excluded them from our analysis. Additionally, many families
of ncRNA probes had lower median intensities, presumably
due to their relatively short length. A conservative
criterion for classification (Figure 4D), which discards
probes overlapping ncRNA genes and considers only
neighboring probes, still yields the co-occupancy effect
among the neighboring probes, suggesting that neither
the high copy number of tRNAs nor that of their associated
Ty elements is responsible for the co-occupancy
effect. We settled on a criterion that excludes any probes
showing any overlap with telomeres or high overlap
with ncRNA genes from the remainder of our analyses,
but we did include probes neighboring ncRNA genes
and those with a low overlap with ncRNA genes in our
definition of ncRNA gene probes (the criterion used in
Figure 4C). A similar criterion was employed in an RNA
polymerase III location study using a similar ChIP-chip
array design [46].



Comparison of occupancy at annotated targets and at 
ncRNA genes 

Annotated targets for each TF were defined as probes
that overlapped or were neighboring (within 100 bp of)
regions reported by MacIsaac et al. [47] within their
p-value threshold of 0.005. After discarding probes
that were annotated both as ncRNA probes (according
to the criterion described above) and as TF targets,
we compared the mean log2 fold enrichment
among ncRNA probes and among annotated targets
with that of all other probes. A significant difference
in means was defined as a t-test passing a p-value
threshold of 0.05, Bonferroni corrected for the number
of tests.
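The comparison of means with a Bonferroni-corrected threshold can be sketched as follows. Several choices here are assumptions rather than the procedure actually used: the text does not state which t-test variant was applied, so the sketch uses the unequal-variance (Welch) statistic, and it approximates the two-sided p-value with a standard normal distribution, which is only reasonable for groups containing many probes. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(x, y):
    """Unequal-variance t statistic for the difference in means of x and y."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def approx_two_sided_p(t):
    """Large-sample approximation: treat t as standard normal (an assumption)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def significant(x, y, n_tests, alpha=0.05):
    """Bonferroni rule: compare the p-value with alpha divided by n_tests."""
    return approx_two_sided_p(welch_t(x, y)) < alpha / n_tests


# Toy log2 fold enrichment values (not real data).
ncrna_probes = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0]
other_probes = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0]
shuffled_others = [0.1, -0.1, 0.0, 0.05, -0.05, 0.0]

enriched = significant(ncrna_probes, other_probes, n_tests=200)
not_enriched = significant(shuffled_others, other_probes, n_tests=200)
```

For small groups, an exact t distribution (for example from a statistics library) would be the appropriate replacement for the normal approximation.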

Correlation with sequence-predicted binding affinity, 
nucleosome affinity, and gene expression 

The affinity of each probed region for TFs was calculated 
using two published libraries of protein binding microarray 
(PBM)-derived position weight matrices (PWMs) [28,29]. 
The PWMs were converted to position-specific affinity 
matrices (PSAMs) and probe-TF affinities were calculated 
using the AffinityProfile utility in the MatrixREDUCE 
package as described previously [48]. The Pearson
correlation between predicted affinity and MFE was then
calculated. Nucleosome occupancy measurements by ChIP-chip
were obtained from Bernstein et al. [32]. For each probed
region, the median log ratio across all assayed histone
subunits was used. The Pearson correlation between
predicted affinity and nucleosome occupancy was then
calculated. Gene expression data from both YPD and
the 30-minute treatment with 0.4 mM H2O2 were
obtained from [31], and probes
were assigned to genes using S. cerevisiae chromosomal
features (Genome Version R64-1-1) annotated
in the Saccharomyces Genome Database (SGD). In cases of
divergent promoters, the value was assigned to both genes.
Probe intensities were quantile normalized using the MATLAB
Bioinformatics Toolbox.
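Quantile normalization, as mentioned above, forces every array to share the same distribution of values. The following is a minimal pure-Python sketch of the general technique and not a reproduction of the MATLAB toolbox routine: the function name is hypothetical, the reference distribution is taken as the mean of the sorted columns, and ties are broken by position rather than averaged (both assumptions).

```python
def quantile_normalize(columns):
    """Quantile normalize equal-length columns (one list of values per array).

    Each value is replaced by the mean, across arrays, of the values
    holding the same rank. Ties are broken by position (an assumption).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n)]

    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices by ascending value
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Toy example: two arrays, three probes each.
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

After normalization both toy columns contain the same set of values, differing only in which probe holds which rank.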

Gene Ontology analysis 

Functional enrichment of probes by Gene Ontology (GO) 
categories [49] was determined using a MATLAB imple- 
mentation of the T-profiler algorithm [50]. The GO an- 
notation was downloaded from SGD (Gene Ontology 
Consortium Validation Date: 01/25/2014). 

Condition-specific analyses

Condition specificity was calculated as follows: for
each YPD experiment, we calculated the Pearson
correlation between the TF's occupancy and the median
occupancy across all other rich media TF experiments,
and also the correlation between its occupancy
and its affinity as predicted from PBM data.
TFs for which no PBM-derived matrix was available
were excluded. In cases for which two matrices were
available (from both Badis et al. [28] and Zhu et al.
[29]), the one with the best correlation to the ChIP
occupancy was used.
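The two correlations described above can be sketched as follows. This is an illustration under assumptions: occupancy and affinity profiles are represented as plain lists indexed by probe, the function names and toy values are hypothetical, and "best correlation" is interpreted as the largest Pearson coefficient.

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collective_and_affinity_r(occupancy, affinities, tf):
    """Correlate one TF's occupancy with (i) the median of all other TFs
    and (ii) its best-matching predicted affinity profile."""
    others = [profile for name, profile in occupancy.items() if name != tf]
    median_others = [median(values) for values in zip(*others)]
    r_collective = pearson(occupancy[tf], median_others)
    # When two matrices exist, keep the one correlating best with ChIP.
    r_affinity = max(pearson(occupancy[tf], a) for a in affinities[tf])
    return r_collective, r_affinity


# Toy profiles over four probes (not real data).
occupancy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 5], "D": [0, 1, 1, 3]}
affinities = {"A": [[4, 3, 2, 1], [1, 2, 3, 5]]}
r_collective, r_affinity = collective_and_affinity_r(occupancy, affinities, "A")
```

Excluding the TF itself from the median avoids inflating the collective correlation with the factor's own signal.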

ChIP-seq analysis

ChIP-seq data from Lefrancois et al. [33] and Zheng et al.
[34] were downloaded from Gene Expression Omnibus 
(http://www.ncbi.nlm.nih.gov/geo). These data include 
mapped peaks, but not genome-wide mapping of reads; 
therefore, read alignment results from ELAND were 
downloaded and processed using MACS [51] as de- 
scribed by the authors in order to obtain a genome-wide 
landscape of binding, in 10-bp bins. Distances from these 
bins to ncRNA genes were measured using the SGD
genome annotation described above and BEDTools [52].
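The distance from each 10-bp bin to the nearest ncRNA gene can be illustrated with a short pure-Python stand-in. This sketch does not reproduce BEDTools behavior or its distance conventions; it assumes half-open (start, end) coordinates on a single chromosome, and the function name and coordinates are hypothetical.

```python
def distance_to_nearest(bin_start, features, bin_size=10):
    """Distance in bp from a bin to the closest feature (0 if they overlap).

    Bins and features are half-open intervals [start, end) on one chromosome.
    """
    bin_end = bin_start + bin_size
    return min(
        max(0, f_start - bin_end, bin_start - f_end)
        for f_start, f_end in features
    )


# Toy ncRNA gene coordinates (not real annotation).
ncrna_genes = [(100, 200), (500, 600)]
distances = [distance_to_nearest(start, ncrna_genes) for start in (150, 250, 480)]
```

A linear scan over features is adequate for a sketch; for genome-wide use, sorted coordinates with a binary search or an interval tool would be the practical choice.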